Enriching the WordNet Taxonomy with Contextual Knowledge Acquired from Text
Abstract
This paper presents a possible solution to the problem of integrating contextual knowledge into the WordNet database. Contextual structures are derived from three sources: (1) minimal contexts, in the form of semantic-net transformations of WordNet glosses; (2) dynamic contexts, rendered by webs of lexico-semantic paths that reveal textually implied information; and (3) static contexts, represented by patterns of concepts and semantic links. The relevance of these structures is measured on a three-tiered benchmark comprising (a) word-sense disambiguation, (b) coreference resolution, and (c) acquisition of domain patterns for information extraction.
1 The Basic Idea
Recently, a new version of the WordNet lexical database [Miller, 1995], developed at Princeton, has become publicly available (www.cogsci.princeton.edu/~wn). WordNet 1.6 contains 126,520 English words grouped into 91,595 synonym sets, called synsets. Words and synsets are interlinked by 391,885 lexico-semantic relations, making WordNet a useful resource for natural language processing systems. WordNet has recently been used in conjunction with annotated corpora (such as Treebank [Marcus et al., 1993]) for applications such as word-sense disambiguation [Ng and Lee, 1996], information extraction [Bagga et al., 1997], text summarization [Robin and McKeown, 1995], conversational implicature [Harabagiu et al., 1996], and probabilistic Web search engines similar to those presented in [Ackerman et al., 1997]. Most of these applications rely implicitly on linguistic and/or discourse contexts, so integrating contextual objects into the WordNet taxonomy is beneficial and can lead to novel, better-performing processing techniques. WordNet covers the majority of English nouns, verbs, adjectives and adverbs, but it implements only fourteen types of lexico-semantic relations. The resulting connectivity between nodes is sparse, and enriching it is desirable.
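To make the connectivity claim concrete, the following is a minimal sketch of synsets as nodes with typed relation links, together with the average number of relations per synset implied by the WordNet 1.6 figures quoted above. The `Synset` class, the synset names, and the toy glosses are illustrative assumptions, not WordNet's actual API or data, and the average assumes each of the 391,885 relations is counted once as an outgoing link.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Synset:
    """Toy synset: synonymous words, a textual gloss, and typed links."""
    name: str                      # hypothetical identifier
    words: tuple                   # synonymous word forms
    gloss: str                     # illustrative gloss, not WordNet's text
    relations: dict = field(default_factory=lambda: defaultdict(set))

    def link(self, relation, target):
        """Add one typed lexico-semantic link to another synset."""
        self.relations[relation].add(target.name)

    def degree(self):
        """Number of outgoing links, summed over all relation types."""
        return sum(len(targets) for targets in self.relations.values())


def average_degree(total_relations, total_synsets):
    """Mean number of relations per synset."""
    return total_relations / total_synsets


# Two toy nodes joined by one hypernym link.
canine = Synset("canine", ("canine", "canid"), "a carnivorous mammal")
dog = Synset("dog", ("dog", "domestic dog"), "a domesticated canine")
dog.link("hypernym", canine)

# Figures quoted in the text for WordNet 1.6.
print(round(average_degree(391_885, 91_595), 2))  # 4.28
```

Under these assumptions a synset has on average only about four links, which illustrates why the text describes the connectivity as sparse.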
The meaning of each synset of WordNet 1.6 is defined by a textual gloss, which can also be considered a minimal contextual definition. Contextual representations in lexical databases have been considered before as important indicators of word senses in WordNet [Miller and Charles, 1991]. The problem was to find an empirical solution to the representation problem. Furthermore, information about the context in which a concept is used brings knowledge about the world, transforming the lexicon into an approximation of common-sense knowledge. The codification of human knowledge using contextual representations was also attempted in
Similar Resources
The Impact of Contextual Clue Selection on Inference
Linguistic information can be conveyed in the form of speech and written text, but it is the content of the message that is ultimately essential for higher-level processes in language comprehension, such as making inferences and associations between text information and knowledge about the world. Linguistically, inference is the shovel that allows receivers to dig meaning out from the text with...
Demo: Enriching Text with RDF/OWL Encoded Senses
This demo paper describes an extension of the Enrycher text enhancement system, which annotates words in context, from a text fragment, with RDF/OWL encoded senses from WordNet and OpenCyc. The extension is based on a general purpose disambiguation algorithm which takes advantage of the structure and/or content of knowledge resources, reaching state-of-the-art performance when compared to other...
Automatic Enrichment of WordNet with Common-Sense Knowledge
WordNet represents a cornerstone in the Computational Linguistics field, linking words to meanings (or senses) through a taxonomical representation of synsets, i.e., clusters of words with an equivalent meaning in a specific context often described by few definitions (or glosses) and examples. Most of the approaches to the Word Sense Disambiguation task fully rely on these short texts as a sour...
Unsupervised Knowledge Extraction for Taxonomies of Concepts from Wikipedia
A novel method for unsupervised acquisition of knowledge for taxonomies of concepts from raw Wikipedia text is presented. We assume that the concepts classified under the same node in a taxonomy are described in a comparable way in Wikipedia. The concepts in 6 taxonomies extracted from WordNet are mapped onto Wikipedia pages and the lexico-syntactic patterns describing semantic structures expre...
Enriching semantic knowledge bases for opinion mining in big data applications
This paper presents a novel method for contextualizing and enriching large semantic knowledge bases for opinion mining with a focus on Web intelligence platforms and other high-throughput big data applications. The method is not only applicable to traditional sentiment lexicons, but also to more comprehensive, multi-dimensional affective resources such as SenticNet. It comprises the following s...